Abstract. We analyze the spectral properties of correlation matrices between distinct statistical systems. 
Such matrices are intrinsically non symmetric, and lend themselves to extend the spectral analyses usually 
performed on standard Pearson correlation matrices to the realm of complex eigenvalues. We employ some 
recent random matrix theory results on the average eigenvalue density of this type of matrices to distinguish 
between noise and non trivial correlation structures, and we focus on financial data as a case study. Namely, 
we employ daily prices of stocks belonging to the American and British stock exchanges, and look for the 
emergence of correlations between two such markets in the eigenvalue spectrum of their non symmetric 
correlation matrix. We find several non trivial results, also when considering time-lagged correlations over 
short lags, and we corroborate our findings by additionally studying the asymmetric correlation matrix of 
the principal components of our datasets. 



1 Introduction 

A huge number of scientific disciplines, ranging from Physics to Economics, often need to deal
with statistical systems described by a large number of degrees of freedom. Typically, it is
very interesting, if not crucial, to analyze the correlations between the random variables
describing such degrees of freedom. For this very reason, the development of both analytical
and numerical tools to tackle the problem of correlation analysis is a fundamental topic in
Multivariate Statistics. In most practical applications, one usually deals with a statistical
system described in terms of $N$ random variables $\mathcal{R}_1, \ldots, \mathcal{R}_N$, and
the most obvious thing to do in order to study such a system is to collect as many observations
as possible of such $\mathcal{R}_i$s. Then, assuming the $\mathcal{R}_i$s to be described by a
stationary joint probability distribution, the observations can be used to compute empirical
time averages of quantities expressed in terms of those variables. So, suppose $T$ equally
spaced observations have been collected for each variable, and let us denote the time $t$
($t = 1, \ldots, T$) observation of the random variable $\mathcal{R}_i$ ($i = 1, \ldots, N$) as
$R_{it}$. Quite straightforwardly, one can collect all such numbers in an $N \times T$ matrix
$\mathbf{R}$ whose generic entry reads $[\mathbf{R}]_{it} = R_{it}$. The most general
correlation structure between the random variables $\mathcal{R}_i$ would read

$$\langle R_{it} R_{jt'} \rangle = C_{ij,tt'}, \tag{1}$$

where $\langle \ldots \rangle$ denotes the expectation with respect to the joint probability
density describing the $\mathcal{R}_i$s. However, in most practical applications the rather
involved structure in equation (1) can be factorized into its "spatial" and temporal parts.
Assuming that the random variables $\mathcal{R}_i$s have zero mean and unit standard deviation,
one could then write:

$$\langle R_{it} R_{jt'} \rangle = C_{ij} \delta_{tt'}, \tag{2}$$

and this will also be the case throughout the rest of this paper. In the previous expression,
the matrix elements $C_{ij}$ (to be collected in a symmetric matrix $\mathbf{C}$) account for
the cross-correlations amongst all possible pairs of variables in the system. On the other
hand, the Kronecker delta in (2) means that no auto-correlations are present in the system.
Also, this means that each $C_{ij}$ in equation (2) can be estimated as the following time
average (where the data are assumed to be standardized):

$$c_{ij} = \frac{1}{T} \sum_{t=1}^{T} R_{it} R_{jt}. \tag{3}$$

This expression is the very well-known Pearson estimator, and all the $c_{ij}$s can be
collected in an $N \times N$ symmetric matrix

$$\mathbf{c} = \frac{1}{T} \mathbf{R} \mathbf{R}^{\mathrm{T}}, \tag{4}$$

which represents a "matrix estimator" for the true correlation matrix $\mathbf{C}$ introduced
in equation (1). So, the problem of characterizing the correlation structure of a statistical
system essentially boils down to the estimation of the $N(N-1)/2$ independent entries of its
correlation matrix from $NT$ empirical observations. However, depending on the length $T$ of
the time series being used, the $c_{ij}$ estimates will inevitably be corrupted by a certain amount
of measurement error, and this will eventually cause the whole correlation matrix $\mathbf{c}$
to be affected by the same problem. Several filtering recipes have been proposed in the
statistical literature in order to partially clean correlation matrices from noise. On the
other hand, a possible approach to attack the problem came from the Physics community,
represented by the tools and methodologies developed in random matrix theory (RMT). Initially
devised by Wigner [1] as a framework in which to model the spectral properties of Hamiltonians
of complex physical systems interacting through unknown laws, RMT gradually underwent a more
formal evolution, eventually becoming a mathematical theory of its own [2,3] and finding a
plethora of applications in extremely different scientific areas [3]. The main RMT result
which is commonly used in correlation data analysis is the well known Marčenko-Pastur
distribution [5], i.e. the average eigenvalue density for the correlation matrix of a system
of uncorrelated Gaussian random variables in the "thermodynamic limit" $N, T \to \infty$, with
$q = T/N$ fixed. Such a distribution intuitively represents a suitable candidate for a "null
model" with no correlations. Thus, any deviation between the Marčenko-Pastur distribution and
the empirically observed eigenvalue density of the data correlation matrix provides
information about the correlation structure of the system under analysis. In the context of
financial data analysis, this type of study was first carried out in the late nineties in
[6,7], where the spectral properties of the correlation matrix of stocks belonging to the
S&P500 Index were analyzed over different time scales. Quite surprisingly, in those works most
of the eigenvalue spectrum was shown to be well fitted by a Marčenko-Pastur distribution,
whereas only a few larger eigenvalues were shown to carry relevant information on the market
correlation structure by "leaking out" of the Marčenko-Pastur region. Ever since such works,
physicists kept on analyzing financial correlation matrices, constantly refining the general
picture described in [6,7] with increasing levels of insight
[8,9,10,11,12,13,14,15,16,17,18,19], and also generalizing the framework defined by equation
(2) to include the effects due to temporal correlations [20,21].
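
As a rough illustration of this null model, the following sketch (not taken from the original analysis) draws uncorrelated Gaussian data, builds the Pearson matrix $\mathbf{R}\mathbf{R}^{\mathrm{T}}/T$ and compares its eigenvalues with the edges $(1 \pm 1/\sqrt{q})^2$ of the Marčenko-Pastur support for unit-variance data. The function names, the matrix sizes and the seed are illustrative assumptions.

```python
import numpy as np

def pearson_matrix(R):
    """Pearson estimator c = R R^T / T for an N x T array of observations."""
    R = np.asarray(R, dtype=float)
    # Standardize each row to zero mean and unit standard deviation.
    R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    return R @ R.T / R.shape[1]

def marchenko_pastur_edges(q):
    """Edges of the Marchenko-Pastur support for q = T/N >= 1 and unit variance."""
    return (1 - q ** -0.5) ** 2, (1 + q ** -0.5) ** 2

rng = np.random.default_rng(0)   # illustrative seed
N, T = 100, 500                  # illustrative sizes, q = 5
eigenvalues = np.linalg.eigvalsh(pearson_matrix(rng.standard_normal((N, T))))
lower, upper = marchenko_pastur_edges(T / N)
print(f"Marchenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
print(f"empirical eigenvalue range: [{eigenvalues.min():.3f}, {eigenvalues.max():.3f}]")
```

For genuinely uncorrelated data the empirical range should sit close to the predicted support, up to finite-size fluctuations at the edges; eigenvalues far outside it are the ones that would be read as a sign of real correlations.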

A quite natural generalization of the above picture is represented by the extension of
correlation analyses to two statistical systems $S_1$ and $S_2$, both described in terms of
$N$ random variables. Then, one can straightforwardly write down the Pearson estimator (3) for
the correlation coefficient between the $i$th variable in $S_1$ and the $j$th variable in
$S_2$:

$$k_{ij} = \frac{1}{T} \sum_{t=1}^{T} R^{(1)}_{it} R^{(2)}_{jt}. \tag{5}$$

Even more generally, one could think of the random variables in $S_1$ as a set of input
variables, whose output is in turn described by the variables in $S_2$ (or vice versa). Then,
it would be of great interest to further generalize (5) to the case of time lagged
correlations, i.e.

$$k_{ij}(\tau) = \frac{1}{T - \tau} \sum_{t=1}^{T-\tau} R^{(1)}_{it} R^{(2)}_{j,t+\tau}, \tag{6}$$

so that equation (5) is recovered for $\tau = 0$. Recovering the previously outlined
framework, it is of course convenient to collect all the $k_{ij}(\tau)$ estimates in an
$N \times N$ matrix $\mathbf{k}(\tau)$. However, the most notable difference of such a matrix
with respect to "ordinary" correlation matrices is that it is no longer symmetric, since
$k_{ij}(\tau) \neq k_{ji}(\tau)$. Hence, its eigenvalues will in general be complex, and this
feature, as we shall see later, will widely enrich the possible spectral analyses to be
performed, and the subsequent considerations on the correlations between the two statistical
systems to be studied.
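
A minimal sketch of the time-lagged estimator, assuming two standardized $N \times T$ arrays stored row-wise (one row per variable). The function name `lagged_cross_correlation` is hypothetical, and the treatment of negative lags (pairing system 1 at time $t$ with system 2 at time $t - |\tau|$, normalized by $T - |\tau|$) is an assumption made here for illustration, not a definition quoted from the text.

```python
import numpy as np

def lagged_cross_correlation(R1, R2, tau=0):
    """Matrix with entries (1/(T-|tau|)) * sum_t R1[i, t] * R2[j, t + tau]."""
    R1 = np.asarray(R1, dtype=float)
    R2 = np.asarray(R2, dtype=float)
    N, T = R1.shape
    if R2.shape != (N, T):
        raise ValueError("both datasets must have the same N x T shape")
    if abs(tau) >= T:
        raise ValueError("the lag must be shorter than the time series")
    if tau >= 0:
        A, B = R1[:, :T - tau], R2[:, tau:]      # pair t with t + tau
    else:
        A, B = R1[:, -tau:], R2[:, :T + tau]     # pair t with t - |tau|
    return A @ B.T / (T - abs(tau))

rng = np.random.default_rng(1)                   # illustrative seed
R1 = rng.standard_normal((4, 50))                # toy data standing in for system 1
R2 = rng.standard_normal((4, 50))                # toy data standing in for system 2
k = lagged_cross_correlation(R1, R2, tau=3)
print(np.linalg.eigvals(k))                      # generally complex: k is not symmetric
```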

In a financial context, it is quite interesting to interpret $S_1$ and $S_2$ as two different
financial markets, so that the matrix $\mathbf{k}(\tau)$ will encode all of the relevant
information on the possible correlations between them. In such a framework, we shall interpret
$R^{(M)}_{it}$ as the standardized time $t$ log-return of the $i$th stock ($i = 1, \ldots, N$)
in market $M$ ($M = 1, 2$). Log-returns are the most commonly used variables in financial
practice, and (at time $t$) they are defined as $\log ( S^{(M)}_{i,t} / S^{(M)}_{i,t-1} )$,
where $S^{(M)}_{i,t}$ denotes the time $t$ spot price of asset $i$ in market $M$.
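
The preprocessing just described can be sketched as follows, assuming prices are stored as an array with one row per stock and one column per trading day. The function name and the price values are made up for illustration.

```python
import numpy as np

def standardized_log_returns(prices):
    """Map an N x (T+1) array of positive prices to N x T standardized log-returns."""
    log_prices = np.log(np.asarray(prices, dtype=float))
    returns = np.diff(log_prices, axis=1)            # log(S_t / S_{t-1})
    returns -= returns.mean(axis=1, keepdims=True)   # zero mean for each stock
    returns /= returns.std(axis=1, keepdims=True)    # unit standard deviation for each stock
    return returns

prices = np.array([[100.0, 101.0, 99.5, 102.0, 103.0],
                   [50.0, 49.0, 49.5, 51.0, 50.5]])  # made-up prices
R = standardized_log_returns(prices)
print(R.shape)
```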

The purpose of this paper is twofold. After briefly reviewing the most relevant spectral
features of asymmetric correlation matrices such as the one introduced in equation (5), our
first goal will be to look for an empirical realization of this type of matrices, providing
some possible methodological guidelines to unravel the genuine correlations between two
distinct complex systems. As anticipated, we choose financial data as a case study. So, our
second main goal will be to verify whether asymmetric correlation matrices can prove to be a
valuable tool for the description of relevant stylized facts observed in financial markets.
Admittedly, in this respect the choice of working with matrices of the type (6) represents a
limitation, since one needs the matrix $\mathbf{k}(\tau)$ to be square (so that it has
eigenvalues), and this forces one to consider an equal number $N$ of stocks in the two
markets. Working with singular values, as in [35], removes this constraint. However, we
believe our first, more general, goal to justify such a limitation.

Before we start to detail our study, it is worth mentioning that an analysis of financial data
based on asymmetric matrices was first attempted in [23]. However, the random matrix benchmark
used in that work was represented by the Ginibre orthogonal ensemble (GinOE), i.e. the
ensemble of random matrices with independent Gaussian real entries and no symmetry
requirement. Despite producing complex eigenvalues, the spectral structure of the GinOE is
completely different from the one produced by the random version of asymmetric correlation
matrices such as the one in equation (5). Thus, we believe the analyses to be presented in our
paper to be based on more solid theoretical grounds.

The paper is organized as follows. In Section 2 the RMT results concerning the average
eigenvalue density of random asymmetric correlation matrices will be overviewed. Then, the
case study on financial data will be detailed in Section 3, where the two subsystems $S_1$ and
$S_2$ will be represented by the American and British stock exchanges, respectively. The
empirical results discussed in Section 3 will be corroborated in Section 4 by investigating
the spectral properties of the standard Pearson correlation matrix of the two datasets to be
used. The paper will then be concluded with some final remarks in Section 5.



2 Random asymmetric correlation matrices 

The asymmetric correlation matrix in equation (6) can be clearly written as a product of two
matrices:

$$\mathbf{k}(\tau) = \frac{1}{T - \tau} \, \mathbf{R}^{(1)}_{0} \left( \mathbf{R}^{(2)}_{\tau} \right)^{\mathrm{T}}, \tag{7}$$

where $[\mathbf{R}^{(1,2)}_{\tau}]_{it} = R^{(1,2)}_{i,t+\tau}$. In the following, we shall
consider the case in which both matrices in the right hand side of equation (7) are random (in
a sense to be made rigorous in a moment). Not many results are known on the spectra of
products of random matrices (see for example [24,25,26,27]) such as the one in equation (7),
and most of them only describe "microscopic" spectral properties. However, in [35] an equation
for the average eigenvalue density for a product of an arbitrary number of large Gaussian
random matrices was derived. Such an equation was derived by means of a planar diagram
expansion (see [29] for a step by step introduction to this technique) under the assumption of
all matrix dimensions going to infinity with their ratios kept fixed. Also, quite importantly
for our present discussion, the aforementioned equation can be solved exactly for the product
of two matrices, as in equation (7). More precisely, assuming all matrix entries in both
$\mathbf{R}^{(1)}_{0}$ and $\mathbf{R}^{(2)}_{\tau}$ to be independent and identically
distributed Gaussian random numbers with zero mean and unit variance, the average eigenvalue
density (in the complex plane) for the $\mathbf{k}^{(12)}$ matrix can be shown [28] to be:

$$\rho_{\mathbf{k}}(\lambda, \lambda^*) = \begin{cases} \dfrac{q^2}{\pi \sqrt{(1-q)^2 + 4 q^2 |\lambda|^2}} & \text{for } |\lambda| \leq q^{-1/2} \\ 0 & \text{for } |\lambda| > q^{-1/2}, \end{cases} \tag{8}$$

where again we have $q = T/N$ and $*$ denotes complex conjugation. Thus, in the thermodynamic
limit $N, T \to \infty$ with $q$ held fixed, the average eigenvalue density
$\rho_{\mathbf{k}}$ displays circular symmetry within a circle of radius $q^{-1/2}$ centered
in the origin of the complex plane. However, for any finite matrix dimension $N$, the circular
symmetry is broken, due to the fact that $\mathrm{Tr}[\mathbf{k}^{(12)}(\tau)]$ is a real
number, and this introduces a constraint on the eigenvalues. Thus, for any finite $N$ an
excess of eigenvalues lying on the real axis, whose relative weight can be shown to decrease
as $1/\sqrt{N}$ [50], can be observed (see Figure 1). When considering complex rather than
real entries for $\mathbf{k}^{(12)}$, circular symmetry is recovered also for finite values of
$N$. Since the leading order (in $N$) results obtained for the eigenvalue densities with real
and complex entries coincide, when taking the infinite matrix size limit one eventually ends
up with the density in equation (8) in both cases.

Fig. 1. Eigenvalues of 50 random asymmetric correlation matrices with N = 100 and T = 500.
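
The random benchmark can be explored numerically with a short Monte Carlo sketch. It assumes two independent $N \times T$ matrices with standard Gaussian entries, no time lag, and the sizes quoted in the caption of Figure 1; the function name and the seed are illustrative, and the printed numbers are not claimed to reproduce the figure.

```python
import numpy as np

def random_asymmetric_spectrum(N, T, rng):
    """Eigenvalues of k = R1 R2^T / T for two independent N x T standard Gaussian matrices."""
    R1 = rng.standard_normal((N, T))
    R2 = rng.standard_normal((N, T))
    return np.linalg.eigvals(R1 @ R2.T / T)

rng = np.random.default_rng(0)     # illustrative seed
N, T = 100, 500                    # sizes quoted in the caption of Figure 1
eigs = np.concatenate([random_asymmetric_spectrum(N, T, rng) for _ in range(50)])
radius = (T / N) ** -0.5           # limiting radius q^{-1/2}
# Real eigenvalues of a real matrix are returned with an exactly vanishing imaginary part.
real_fraction = np.mean(np.abs(eigs.imag) < 1e-12)
print(f"predicted radius: {radius:.3f}, largest modulus: {np.abs(eigs).max():.3f}")
print(f"fraction of eigenvalues on the real axis: {real_fraction:.3f}")
```

Repeating the experiment for growing $N$ at fixed $q$ is a simple way to watch the fraction of real eigenvalues shrink and the edge of the spectrum sharpen.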

Given the circular symmetry, one can safely work with the radial eigenvalue density derived
from (8), which reads
$\rho^{\mathrm{rad}}_{\mathbf{k}}(x) = 2 \pi x \, \rho_{\mathbf{k}}(\lambda, \lambda^*) |_{|\lambda| = x}$.
Now, the thermodynamic limit density (8) reaches a finite value at the boundary of its domain
($|\lambda| = q^{-1/2}$), and then abruptly becomes equal to zero. However, when working with
finite sized matrices, this transition is smoothed according to the following damping
(conjectured in [28], inspired by analogous finite size corrections that can be introduced
rigorously for the Ginibre random matrix ensembles [4,31], and actually proved in [27]):

$$\rho^{\mathrm{eff}}_{\mathbf{k}}(x) = \rho^{\mathrm{rad}}_{\mathbf{k}}(x) \, \frac{1}{2} \, \mathrm{erfc} \left( h \left( x - q^{-1/2} \right) \right), \tag{9}$$

where the parameter $h$ is phenomenological and needs to be adjusted by fitting. See Figure 2
for an example: as can be seen, the excess of eigenvalues on the real axis almost does not
affect the overall shape of the radial density, even for relatively small matrix dimensions.
Thus, in all of our following analyses we shall freely compare empirical data with the density
in equation (9).
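
The two radial densities can be sketched in a few lines. This example assumes the thermodynamic-limit density $q^2 / (\pi \sqrt{(1-q)^2 + 4 q^2 |\lambda|^2})$ inside the disk of radius $q^{-1/2}$, and a finite-size damping factor $\frac{1}{2} \mathrm{erfc}(h (x - q^{-1/2}))$ applied to the same expression continued beyond the edge; both forms, and the function names, are assumptions of this sketch. The value of $h$ is the one quoted in the caption of Figure 2.

```python
import math

def radial_density(x, q):
    """Radial density 2*pi*x*rho(x), written without the hard cutoff at x = q^{-1/2}."""
    return 2.0 * q**2 * x / math.sqrt((1.0 - q) ** 2 + 4.0 * q**2 * x**2)

def effective_radial_density(x, q, h):
    """Finite-size density: radial_density damped by (1/2) erfc(h (x - q^{-1/2}))."""
    return radial_density(x, q) * 0.5 * math.erfc(h * (x - q ** -0.5))

q, h = 5.0, 27.9                   # N = 100, T = 500, and the h quoted for Figure 2
edge = q ** -0.5
steps = 20000
# Midpoint rule: for q > 1 the sharp density should integrate to 1 over [0, q^{-1/2}].
norm = sum(radial_density(edge * (i + 0.5) / steps, q) for i in range(steps)) * edge / steps
ratio = effective_radial_density(edge, q, h) / radial_density(edge, q)
print(f"normalization of the sharp density: {norm:.6f}")
print(f"damped / sharp density at the edge: {ratio:.2f}")
```

The normalization check is a quick consistency test of the assumed prefactor, and the ratio at the edge shows that the damping halves the density exactly at $x = q^{-1/2}$.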



3 Empirical analysis 

In this section we shall look for an empirical realization of the asymmetric correlation
matrix (6) in a financial context. Namely, as already anticipated, in the following we shall
consider two different financial markets as the two statistical systems from which the data
$R^{(1)}_{it}$ and $R^{(2)}_{j,t+\tau}$ (see again equation (6)) are drawn. In particular, we
shall focus on the American and British financial markets by employing prices of stocks
belonging to the S&P500 Index and the FTSE350 Index.

The dataset to be used is made of daily prices of $N = 200$ stocks from each market (so 400
stocks overall) covering the years 2005-2011 ($T = 1595$ log-returns). It is important to
remark here that, in order to empirically recreate the correlation matrix (6) (especially for
$\tau = 0$, as in equation (5)) it is mandatory to work with data well defined on the same
time steps $t = 1, \ldots, T$. For this very reason, prices collected from the American market
during British holidays (and vice versa) were removed from the datasets.

Fig. 2. Radial density corresponding to the eigenvalues in Figure 1 fitted with the effective
finite size density of equation (9) (finding h = 27.9).

When actually computing the eigenvalue spectrum of the generalized correlation matrix (6) for
the aforementioned S&P500 and FTSE350 datasets, two main features can be clearly
distinguished: a main eigenvalue bulk close to zero and one large (in modulus) eigenvalue. We
shall separately discuss those two aspects.

3.1 The largest eigenvalue 

In the following, the variables $R^{(1)}_{it}$ in equation (6) will be meant to be the
log-returns of stocks belonging to the S&P500 Index, whereas the variables
$R^{(2)}_{j,t+\tau}$ represent log-returns of stocks belonging to the FTSE350 Index.

In Figure 3 the largest (in absolute value) eigenvalue $|\lambda_{\max}|$ is plotted as a
function of $\tau$ (blue solid line). It is worth remarking that, except for a few cases, such
eigenvalue is always found to be real. Intuitively, this is because it actually accounts for
most of the trace of the $\mathbf{k}(\tau)$ matrix, which is a real number too. Now, as one
can see from Figure 3, the largest values of $|\lambda_{\max}|$ are found for $\tau = 0, 1$.
More specifically, in both such cases $\lambda_{\max}$ is real and we have
$\lambda_{\max}(\tau = 0) = 36.4$ and $\lambda_{\max}(\tau = 1) = 23.3$. Quite interestingly,
one finds $\lambda_{\max}(\tau = -1) = 3.1$, much smaller than $\lambda_{\max}(\tau = 1)$.
This asymmetry highlights (also in the light of the interpretation of $\lambda_{\max}$ as
average correlation to be discussed in the following) a strong influence of past American
stock prices on the following day's British stock prices.

In order to verify the robustness of such evidence, we also computed the values of
$\lambda_{\max}$ for $\tau = 0, \pm 1$ over eight different portions of our datasets (all of
them made of 1195 daily log-returns and starting at $t = 1, 50, 100, \ldots, 350$). Over such
eight samples we find, for $\tau = 0$, an average value of $\lambda_{\max}(\tau = 0) = 37.3$
with a standard deviation $\sigma(\tau = 0) = 1.5$, whereas for $\tau = \pm 1$ we find
$\lambda_{\max}(\tau = 1) = 24.1$ and $\lambda_{\max}(\tau = -1) = 3.8$ with standard
deviations $\sigma(\tau = 1) = 0.1$ and $\sigma(\tau = -1) = 0.4$. In some cases, such
estimates only appear to be close to, but not perfectly compatible with, the ones shown
previously for the whole dataset. However, this fact does not point out any inconsistency,
since the average and standard deviation values we reported are computed over (sometimes
largely) overlapping time windows. So, they are not to be considered for any serious
statistical comparison, and are only meant to qualitatively show how the estimates for
$\lambda_{\max}$ fluctuate over time.

Fig. 3. Absolute value of the largest eigenvalue $\lambda_{\max}$ of the asymmetric
correlation matrix $\mathbf{k}(\tau)$ as a function of $\tau$ (in days).

For values of $\tau$ other than 0 and $\pm 1$, $|\lambda_{\max}|$ seems to follow a random
path, approximately lying between 0 and 10. The interesting point, however, is that
$|\lambda_{\max}|$ is very often found to be much larger than the limiting radius predicted by
RMT for the eigenvalue density of random asymmetric correlation matrices. As already detailed
in the previous section, such a radius is equal to $q^{-1/2} = \sqrt{N/T}$ (see equation (8)).
With the values of $N$ and $T$ of our dataset we have a radius $q^{-1/2} \simeq 0.35$, much
smaller than most values of $|\lambda_{\max}|$. At first, this might seem to suggest the
existence of some non trivial long-range correlation. On the contrary, such persistently high
values can be shown to be a spurious effect by means of the following argument. Let
$\bar{k}(\tau)$ be the average estimated correlation between stocks in the two markets, i.e.

$$\bar{k}(\tau) = \frac{1}{N^2} \sum_{i,j=1}^{N} k_{ij}(\tau), \tag{10}$$

with $k_{ij}(\tau)$ defined as in equation (6). Let us then approximate the whole matrix as
$\mathbf{k}(\tau) \simeq \bar{k}(\tau) \mathbf{E}_N$, where $\mathbf{E}_N$ is the
$N \times N$ matrix whose entries are all equal to one: this amounts to approximating all
correlations in $\mathbf{k}(\tau)$ with their average. Now, it can be easily shown that the
matrix $\mathbf{E}_N$ has one eigenvalue equal to $N$ and $N - 1$ eigenvalues equal to zero.
Under such a "mean field" approximation the eigenvalue spectrum of $\mathbf{k}(\tau)$ would
follow from

$$\det(\mathbf{k}(\tau) - \lambda \mathbf{1}_N) \simeq \det(\bar{k}(\tau) \mathbf{E}_N - \lambda \mathbf{1}_N) = (-\lambda)^{N-1} \left( \bar{k}(\tau) N - \lambda \right), \tag{11}$$

where $\mathbf{1}_N$ represents the $N \times N$ identity matrix. Equation (11) means that we
would have $N - 1$ zero modes plus one eigenvalue equal to $\bar{k}(\tau) N$. Quite
remarkably, this simple and apparently very rough approximation is actually enough to explain
the persistence of a large eigenvalue over large time lags: the red dashed line in Figure 3
represents $|\bar{k}(\tau)| N$, and one can see how closely this follows the path of the
largest eigenvalue $|\lambda_{\max}|$. All in all, this latter merely reflects the average
correlation for a certain value of the time lag $\tau$. Most importantly, this is also true
for $\tau = 0, 1$, i.e. when $|\lambda_{\max}|$ reaches its highest measured values, and such
evidence tells us, unsurprisingly, that the average correlations are much higher for those
values of $\tau$. For other values of $\tau$, the absolute value of the average correlation
approximately lies between 0 and 0.04 (i.e. very small values), but the enhancing factor $N$
causes the corresponding large eigenvalue $|\bar{k}(\tau)| N$ to lie between 0 and 10, as
already stated. In the genuinely random matrix model for $\mathbf{k}(\tau)$ outlined in
Section 2 the average correlation $\bar{k}(\tau)$ is very strongly suppressed, so that no
large and isolated eigenvalues can appear.
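
The mean field argument can be checked on a toy matrix. The sketch below assumes a uniform average correlation (a hypothetical value, chosen only for illustration) plus independent Gaussian noise of typical size $1/\sqrt{T}$; it is a synthetic stand-in, not the empirical matrix discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
N, T = 200, 1595                        # sizes of the dataset described in the text
k_bar = 0.18                            # hypothetical average correlation

# The all-ones matrix E_N has one eigenvalue equal to N and N - 1 equal to zero.
ones_eigs = np.linalg.eigvalsh(np.ones((N, N)))

# Toy correlation matrix: uniform average correlation plus noise.
k = k_bar * np.ones((N, N)) + rng.standard_normal((N, N)) / np.sqrt(T)
eigs = np.linalg.eigvals(k)
lam_max = eigs[np.argmax(np.abs(eigs))]
mean_field = k.mean() * N               # prediction: average entry times N
rest = np.sort(np.abs(eigs))[:-1]       # moduli of the remaining N - 1 eigenvalues

print(f"largest eigenvalue: {lam_max.real:.2f}, mean field prediction: {mean_field:.2f}")
print(f"largest remaining modulus: {rest.max():.3f}, sqrt(N/T) = {np.sqrt(N / T):.3f}")
```

Even a small average entry is amplified by the factor $N$ into an isolated eigenvalue, while the rest of the spectrum stays within a disk of radius close to $\sqrt{N/T}$.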



3.2 Bulk of the spectrum 

In Figure 4 the main part of the radial eigenvalue spectrum of the $\mathbf{k}(\tau)$ matrix
constructed with the aforementioned S&P and FTSE datasets is plotted (blue dots) for
$\tau = 0$ and $\tau = 100$. In order to improve the statistics, a bootstrap approach is
followed: namely, 200 iterations are performed and, for each of those, the $\mathbf{k}(\tau)$
matrix is constructed by randomly selecting 190 stocks out of the 200 available ones for each
of the two stock sets. This is done under the reasonable assumption that the eigenvalue
spectrum will not be drastically affected, at least in its overall appearance, by the
particular stock selection. Also, in both plots of Figure 4 the effective radial eigenvalue
density (9) predicted by RMT for $N = 190$ and $T = 1595$ ($T = 1495$ for the case
$\tau = 100$) is shown. In both cases the $h$ parameter was determined by fitting on Monte
Carlo densities with very large statistics. As is immediate to see, for both of the considered
values of $\tau$ the empirical and theoretical densities have no similarity at all. Also,
trying to fit the effective density (9) allowing the $q$ ratio to be a free parameter (much in
the same spirit of what was done in [6,7] with the Marčenko-Pastur density) does not provide
acceptable results, essentially due to the much slower falloff of the empirical densities with
respect to the exponential one of the RMT radial density. At first, one might naively
interpret such discrepancies, especially the one for $\tau = 100$, as a sign of some
long-range time correlations between the markets under study. However, one should recall that
the RMT densities in equations (8) and (9) are derived for the $\mathbf{k}(\tau)$ matrix in
(6) under the assumption that the two sub-systems have no mutual correlation and no
correlation of their own. This is a crucial point: using a large time lag $\tau$ should
suppress all correlations between the stocks in the two datasets, and this is actually
confirmed by the previous analysis on the $\tau$-dependence of the average correlation
$\bar{k}$. However, using a sliding time window does not suppress the self-correlations within
each market: figuratively speaking, those are "dragged along" by the sliding window $\tau$
itself. Hence, one should try to disentangle the two different types of correlations, getting
rid of the inner ones while retaining only those existing between the two sub-systems. Quite
naturally, this task can be accomplished by mapping the original variables onto the
corresponding sets of principal components.
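
The stock resampling used to enhance the spectrum statistics can be sketched as follows. The sketch assumes zero lag, independent random subsets for the two datasets, and synthetic Gaussian data in place of the real log-returns; the function name and all sizes are illustrative.

```python
import numpy as np

def resampled_moduli(R1, R2, n_keep, n_iter, rng):
    """Pool the eigenvalue moduli of R1[rows] R2[rows']^T / T over random row subsets."""
    N, T = R1.shape
    moduli = []
    for _ in range(n_iter):
        rows1 = rng.choice(N, size=n_keep, replace=False)   # subset of dataset 1
        rows2 = rng.choice(N, size=n_keep, replace=False)   # subset of dataset 2
        k = R1[rows1] @ R2[rows2].T / T
        moduli.append(np.abs(np.linalg.eigvals(k)))
    return np.concatenate(moduli)

rng = np.random.default_rng(0)          # illustrative seed
R1 = rng.standard_normal((20, 100))     # toy stand-ins for the two datasets
R2 = rng.standard_normal((20, 100))
moduli = resampled_moduli(R1, R2, n_keep=19, n_iter=5, rng=rng)
density, bin_edges = np.histogram(moduli, bins=10, density=True)  # radial histogram
```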

3.3 Mapping onto principal components 

Starting from our two datasets, let us construct their standard Pearson correlation matrices
in the usual way as $\mathbf{C}^{(1)} = \mathbf{R}^{(1)} (\mathbf{R}^{(1)})^{\mathrm{T}} / T$
and $\mathbf{C}^{(2)} = \mathbf{R}^{(2)} (\mathbf{R}^{(2)})^{\mathrm{T}} / T$. Denoting their
eigenvalues as $\lambda^{(1)}_i$ and $\lambda^{(2)}_j$ (for $i, j = 1, \ldots, N$), and the
corresponding eigenvectors as
$\mathbf{V}^{(1,i)} = (V^{(1,i)}_1, \ldots, V^{(1,i)}_N)$ and
$\mathbf{V}^{(2,j)} = (V^{(2,j)}_1, \ldots, V^{(2,j)}_N)$, principal components are defined
as follows:

$$e^{(M)}_{it} = \frac{1}{\sqrt{\lambda^{(M)}_i}} \sum_{j=1}^{N} V^{(M,i)}_j R^{(M)}_{jt}, \tag{12}$$

where $M = 1, 2$ and $R^{(1)}_{jt}$ and $R^{(2)}_{jt}$ are as in equation (5). Now, exploiting
eigenvector orthogonality one can immediately verify that principal components are exactly
uncorrelated:

$$\frac{1}{T} \sum_{t=1}^{T} e^{(M)}_{it} e^{(M)}_{jt} = \delta_{ij}. \tag{13}$$

Moreover, inverting equation (12) one can expand any of the original variables in terms of
principal components:

$$R^{(M)}_{it} = \sum_{j=1}^{N} \sqrt{\lambda^{(M)}_j} \, V^{(M,j)}_i \, e^{(M)}_{jt}. \tag{14}$$

This relation is exact and shows that any of the random variables $R^{(M)}_{it}$ can be
decomposed over a set of uncorrelated variables, whose explanatory power (in terms of
variance) of the original variables' dynamics can be ranked depending on the size of the
corresponding eigenvalues.
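
A minimal numerical sketch of this construction, assuming standardized data with $T > N$ so that all eigenvalues of the Pearson matrix are strictly positive. The synthetic data, the function name and the seed are illustrative; note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so the dominant component is the last one.

```python
import numpy as np

def principal_components(R):
    """Return (eigenvalues, eigenvectors, components) for a standardized N x T array R."""
    T = R.shape[1]
    lam, V = np.linalg.eigh(R @ R.T / T)     # columns of V are orthonormal eigenvectors
    e = (V.T @ R) / np.sqrt(lam)[:, None]    # e_i = lam_i^{-1/2} * sum_j V_j^{(i)} R_j
    return lam, V, e

rng = np.random.default_rng(0)               # illustrative seed
N, T = 5, 400
R = rng.standard_normal((N, N)) @ rng.standard_normal((N, T))   # synthetic correlated data
R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)

lam, V, e = principal_components(R)
W = V * np.sqrt(lam)                         # W_ij = sqrt(lam_j) * V_i^{(j)}
print(np.round(e @ e.T / T, 6))              # identity matrix: components are uncorrelated
print(np.allclose(W @ e, R))                 # original variables are recovered exactly
```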

Principal components look like a quite appealing set of variables to use in the framework of
asymmetric correlation matrices between two distinct financial markets. As already stated, the
huge and persistent (over large time lags) deviations between empirical spectra and RMT
predictions seem to be due to the inner correlations of the two markets. Switching to
principal components circumvents this problem. Let us then introduce the asymmetric
correlation matrix between principal components of the two datasets in use. We shall write the
correlation coefficients as

$$k^{(e)}_{ij}(\tau) = \frac{1}{T - \tau} \sum_{t=1}^{T-\tau} e^{(1)}_{it} e^{(2)}_{j,t+\tau}, \tag{15}$$

and we shall collect them in a matrix $\mathbf{k}^{(e)}(\tau)$. So now, since the principal
components in each set are completely uncorrelated, any deviation of the
$\mathbf{k}^{(e)}(\tau)$ matrix's eigenvalue spectrum from the pure noise RMT prediction can
only be imputed to correlations between the two sub-systems under study, encoded as
correlations between their respective principal components. Even more interestingly, as is
quite well known, the first few principal components, i.e. the dominant ones related to the
largest eigenvalues, can be given a simple financial interpretation (see for example [11]):
the first one arises as a consequence of collective market fluctuation (hence it is usually
given the name of "market mode"), and the first few after that generally correspond to market
sectors. Hence, before studying the whole spectrum of the $\mathbf{k}^{(e)}(\tau)$ matrix, let
us take a look at the correlations between such variables. In Figures 5 and 6 the
$\tau$-dependence of some matrix elements in $\mathbf{k}^{(e)}(\tau)$ is shown. Namely, in
Figure 5 the correlation coefficients $k^{(e)}_{11}$ and $k^{(e)}_{22}$ between the two main
principal components (i.e. those related to the two largest eigenvalues) in the two datasets
are plotted. Such principal components account for 46.5% of the overall data variance in the
S&P dataset, and for 31.7% in the FTSE dataset. As one can see, quite strong correlations
(either positive or negative) are again found for $\tau = 0, 1$:
$k^{(e)}_{11}(\tau = 0) = 0.54$, $k^{(e)}_{11}(\tau = 1) = 0.34$ and
$k^{(e)}_{22}(\tau = 0) = -0.31$, $k^{(e)}_{22}(\tau = 1) = -0.29$. For different values of
$\tau$, much smaller values are found, similarly to the case of the largest eigenvalue (see
Figure 3). On the contrary, correlations between the first and second principal components in
the two datasets, encoded in the matrix elements $k^{(e)}_{12}$ and $k^{(e)}_{21}$, are found
to be quite small for all values of $\tau$ (see Figure 6). Similar facts, i.e. strong
"diagonal" correlations for $\tau = 0, 1$ and weak "off-diagonal" correlations, are observed
also when considering the other most relevant principal components. This is quite interesting
since, apart from the first components $e^{(1)}_1$ and $e^{(2)}_1$, which represent market
modes, the other most relevant principal components do not necessarily represent one
well-defined market sector or the same sector in the two markets. Nevertheless, their quite
strong mutual correlations for $\tau = 0, 1$ suggest that they encode relevant information
about "orthogonal" (in the sense made rigorous by PCA) market portions, which remain
"orthogonal" across different financial markets (as demonstrated by the small "off-diagonal"
correlations $k^{(e)}_{ij}(\tau)$ for $i \neq j$).

Fig. 4. Eigenvalue spectrum of the asymmetric correlation matrix $\mathbf{k}(\tau)$ for stocks
belonging to the S&P500 and FTSE350 indices (blue dots), with N = 190, T = 1595 and
$\tau = 0$ (left), T = 1495 and $\tau = 100$ (right). The spectrum statistics is enhanced by
means of a bootstrap approach (see main text). The solid line represents the effective radial
density predicted by RMT (equation (9)) for the same values of N and T. The h parameter was
adjusted by fitting on a Monte Carlo density with large statistics, yielding h = 51.80 and
h = 50.86 in the two cases.

Fig. 5. Correlation coefficients $k^{(e)}_{11}(\tau)$ (solid line) and $k^{(e)}_{22}(\tau)$
(dashed line).

Fig. 6. Correlation coefficients $k^{(e)}_{12}(\tau)$ (solid line) and $k^{(e)}_{21}(\tau)$
(dashed line).

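As a concluding remark to this discussion, let us also clarify how the correlations between
different principal components impact those between the "true", original variables (daily
log-returns in our case). Starting from the correlation coefficient (6), and using equation
(14), one finds

$$\begin{aligned}
k_{ij}(\tau) &= \frac{1}{T - \tau} \sum_{t=1}^{T-\tau} R^{(1)}_{it} R^{(2)}_{j,t+\tau} \\
&= \frac{1}{T - \tau} \sum_{t=1}^{T-\tau} \sum_{\ell,s=1}^{N} \sqrt{\lambda^{(1)}_{\ell} \lambda^{(2)}_{s}} \, V^{(1,\ell)}_i V^{(2,s)}_j \, e^{(1)}_{\ell t} e^{(2)}_{s,t+\tau} \\
&= \sum_{\ell,s=1}^{N} \sqrt{\lambda^{(1)}_{\ell} \lambda^{(2)}_{s}} \, V^{(1,\ell)}_i V^{(2,s)}_j \, k^{(e)}_{\ell s}(\tau),
\end{aligned} \tag{16}$$

and from this relation one sees that, unsurprisingly, the largest eigenvalues and largest
correlations between principal components account, for the most part, for the correlations
between the original variables. Defining two $N \times N$ matrices $\mathbf{W}^{(1)}$ and
$\mathbf{W}^{(2)}$ with entries

$$W^{(M)}_{ij} = \sqrt{\lambda^{(M)}_j} \, V^{(M,j)}_i \tag{17}$$

(where $M = 1, 2$) allows us to rewrite equation (16) in matrix form:

$$\mathbf{k}(\tau) = \mathbf{W}^{(1)} \, \mathbf{k}^{(e)}(\tau) \left( \mathbf{W}^{(2)} \right)^{\mathrm{T}}. \tag{18}$$

We shall come back later to this point.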
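
The matrix relation between the correlations of the original variables and those of the principal components can be verified numerically. The sketch assumes synthetic data sharing one common factor, a non-negative lag, and weight matrices whose column $j$ is the $j$th eigenvector scaled by the square root of its eigenvalue; the helper names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Give every row zero mean and unit standard deviation."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def lagged(A, B, tau):
    """(1/(T-tau)) * sum_t A[i, t] * B[j, t + tau], for a non-negative lag tau."""
    T = A.shape[1]
    return A[:, :T - tau] @ B[:, tau:].T / (T - tau)

def weights_and_components(R):
    """Return W (columns sqrt(lam_j) * V^{(j)}) and the principal components e."""
    lam, V = np.linalg.eigh(R @ R.T / R.shape[1])
    return V * np.sqrt(lam), (V.T @ R) / np.sqrt(lam)[:, None]

rng = np.random.default_rng(0)                 # illustrative seed
N, T, tau = 6, 500, 2
common = rng.standard_normal(T)                # factor shared by the two toy systems
R1 = standardize(rng.standard_normal((N, T)) + common)
R2 = standardize(rng.standard_normal((N, T)) + common)

W1, e1 = weights_and_components(R1)
W2, e2 = weights_and_components(R2)
k = lagged(R1, R2, tau)                        # correlations of the original variables
k_e = lagged(e1, e2, tau)                      # correlations of the principal components
print(np.allclose(k, W1 @ k_e @ W2.T))
```

The identity holds exactly for any lag because each original series is a fixed linear combination of its own principal components at every time step.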

Finally, let us look at the eigenvalue spectrum of the asymmetric correlation matrix
$\mathbf{k}^{(e)}(\tau)$ of the principal components. In Figure 7 the empirical radial
eigenvalue spectra of the $\mathbf{k}^{(e)}(\tau)$ matrix are plotted for $\tau = 0, 1, 30$
(top-left, top-right and bottom, respectively). In all cases, the same bootstrap approach
already adopted for the spectra in Figure 4 is used, i.e. 200 iterations are performed, each
time randomly selecting $N = 190$ stocks out of the 200 available ones in each dataset. As can
be seen, for all values of $\tau$ one now ends up with an eigenvalue spectrum which is much
closer to the one predicted by RMT (solid line in all plots of Figure 7) than when using the
original variables (Figure 4). For $\tau = 0, 1$ significant correlations between the two
markets under study exist, as pointed out in the previous analyses, and this is reflected into
visible deviations between the empirical and the theoretically expected eigenvalue density.
For larger values of $\tau$ (exemplified by the bottom plot of Figure 7) the overall agreement
improves: the exponential falloff of the RMT density is quite well reproduced (whereas for
$\tau = 0, 1$ this is not the case), but still an excess of eigenvalues lying around the peak
region of the distribution can be clearly seen. However, even though the agreement between
data and theory is still not excellent even after switching to principal components, the main
point to be discussed is the following: namely, all the theoretical densities in Figure 7 are
fitted to the empirical histograms, allowing both $h$ and $q$ (see equation (9)) to be free
parameters. Now, whereas the former parameter is phenomenological by definition, the latter
should in principle be given by the ratio $T/N$. The values of $N$ and $T$ used in Figure 7
give $q \simeq 8.4$ while by fitting one obtains $q = 5.59$, $q = 5.64$ and $q = 6.08$ for
$\tau = 0, 1, 30$ respectively: in all cases the effective $q$ parameter is very different
from its expected value. Moreover, one can also check that by performing one same time
reshuffling for all the $e^{(1)}_i$s and another one (different from the first) for all the
$e^{(2)}_i$s, the expected value of $q$ is essentially reached (see Figure 8, where the radial
density (9) is fitted giving $q = 8.24$ when $\tau = 0$ and $q = 8.15$ when $\tau = 30$, very
close to the "natural" value $q \simeq 8.4$). So, how to interpret this result?

Performing one time reshuffling within one dataset and 
a different one within the other one has the following ef- 
fects on the different types of correlations involved: 

— Performing one same reshuffling for all the variables 
within one set keeps their mutual cross-correlations in- 
tact. Since the variables being dealt with here are prin- 
cipal components, this kind of reshuffling keeps them 
uncorrelated (see equation (13)).

— Since the two reshufflings performed on the two datasets
are different, all correlations between variables belong- 
ing to different sub-systems are destroyed. 

— Performing a time reshuffling on a time series reason- 
ably destroys all possible autocorrelations in it. 

As a matter of fact, the first two points in the above list
empirically recreate the conditions under which the RMT
density (9) is derived, i.e. no self-correlations within each
system and no correlation between the two. However, such
conditions are essentially obtained also when the
correlations amongst principal components are computed for
large enough values of τ (see Figures [S] and O), and yet
the example shown in Figure 7 shows that this is not
enough, since one ends up with an effective value of q which
is quite far from the expected one. So, the last point in
the above list appears to be the crucial one.
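
The first two effects listed above can be checked numerically. The following is a minimal sketch on synthetic data, not the procedure actually used for the figures: the helper names (`standardize`, `reshuffle_in_time`, `cross_correlation`), the sizes N and T and the single shared factor are all illustrative assumptions. It shows that one common time permutation leaves the correlation matrix within a set unchanged, while two different permutations wash out the correlations between the two sets.

```python
import numpy as np

def standardize(x):
    """Give every row (time series) zero mean and unit variance."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def reshuffle_in_time(x, rng):
    """Apply one and the same random time permutation to every row of x."""
    return x[:, rng.permutation(x.shape[1])]

def cross_correlation(x1, x2):
    """Equal-time asymmetric correlation matrix between two standardized N x T sets."""
    return x1 @ x2.T / x1.shape[1]

rng = np.random.default_rng(0)
N, T = 20, 2000                      # illustrative sizes, not those of the paper
common = rng.standard_normal(T)      # hypothetical factor shared by both sets
set1 = standardize(rng.standard_normal((N, T)) + common)
set2 = standardize(rng.standard_normal((N, T)) + common)

shuffled1 = reshuffle_in_time(set1, rng)   # one permutation for the whole first set
shuffled2 = reshuffle_in_time(set2, rng)   # a different one for the whole second set

# Correlations within one set are invariant under a common permutation.
within_before = set1 @ set1.T / T
within_after = shuffled1 @ shuffled1.T / T

# Correlations between the two sets are destroyed by the two different permutations.
cross_before = cross_correlation(set1, set2)
cross_after = cross_correlation(shuffled1, shuffled2)

print("within-set matrix unchanged:", np.allclose(within_before, within_after))
print("mean |cross-correlation| before:", np.abs(cross_before).mean())
print("mean |cross-correlation| after: ", np.abs(cross_after).mean())
```

After the reshuffling the entries of the cross-correlation matrix are expected to be of order 1/sqrt(T), i.e. compatible with pure noise.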

Finding values of q smaller than the "natural" ones, as in
Figure 7, amounts to larger effective values of N or smaller
effective values of T, and the latter seems the only possible
option. As a matter of fact, principal component analysis
guarantees that no cross-correlations exist between principal
components (equation (13)). Nevertheless, nothing prevents
such variables from displaying autocorrelations, contrary to
the original variables, i.e. the log-returns, which are known
not to display any relevant autocorrelation (see for example
[35]). In this respect, see Figure 9, where the
autocorrelations (as a function of τ) of the first two
principal components for each of our datasets are plotted;
the 99.7% confidence interval for a purely random process of
length T is shown in red. As can be clearly seen, the interval
boundaries are crossed several times, thus illustrating that,
indeed, the main principal components do feature
autocorrelations (similar behaviors are found for all other
principal components). On a qualitative level, the presence of
autocorrelations reduces the number of degrees of freedom in
the system, and justifies the need to accordingly adjust T to
an effective dimensionality [33].

Fig. 7. Empirical eigenvalue spectra (enhanced by bootstrap) of the k^(e)(τ) correlation matrix (15) of principal
components for τ = 0 (top-left), τ = 1 (top-right) and τ = 30 (bottom), fitted with the radial density (9) (solid line).

Fig. 8. Empirical eigenvalue density of the asymmetric correlation matrix of principal components (equation (15))
after performing one same time reshuffling on all the stocks in the S&P dataset and another one on all stocks in the
FTSE dataset (see the main text for more details on this). The left plot refers to the case τ = 0, while in the right
plot we have τ = 30.

Fig. 9. Autocorrelation function of the first two principal components of the S&P (left plot) and FTSE (right plot)
datasets. In both plots, solid lines refer to the first principal component, while dashed lines refer to the second one.
The region delimited by the red horizontal lines represents the 99.7% confidence interval for the autocorrelation of a
purely random process.
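
An autocorrelation check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the band is taken to be ±3/sqrt(T), the standard large-sample approximation for white noise, which may differ from the interval actually used in Figure 9; the function `autocorrelation`, the series length and the AR(1) toy series with coefficient `phi` are hypothetical stand-ins for the principal components.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - lag], x[lag:]) / denom
                     for lag in range(max_lag + 1)])

rng = np.random.default_rng(1)
T = 4000                               # illustrative series length
noise = rng.standard_normal(T)         # purely random reference process

# Toy autocorrelated series (AR(1)), standing in for a principal component.
phi = 0.3
ar1 = np.empty(T)
ar1[0] = noise[0]
for t in range(1, T):
    ar1[t] = phi * ar1[t - 1] + noise[t]

# Assumed 99.7% band for white noise: three standard deviations of 1/sqrt(T).
band = 3.0 / np.sqrt(T)

acf_noise = autocorrelation(noise, 20)
acf_ar1 = autocorrelation(ar1, 20)

# Count how many non-zero lags fall outside the band.
crossings_noise = int(np.sum(np.abs(acf_noise[1:]) > band))
crossings_ar1 = int(np.sum(np.abs(acf_ar1[1:]) > band))
print("band:", band)
print("lags outside the band (white noise):", crossings_noise)
print("lags outside the band (AR(1)):", crossings_ar1)
```

Repeated crossings of the band, as in the AR(1) case, are the signature that the text associates with a reduced effective value of T.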



4 Joint correlation matrix 

Before concluding, let us complement our analyses on
asymmetric correlation matrices by studying the standard
correlation matrix of our whole dataset. Let us then consider
the following 2N × T matrix:

$$
R = \begin{pmatrix} R^{(1)} \\ R^{(2)} \end{pmatrix} \tag{19}
$$

where R^(1) and R^(2) are two N × T matrices containing
the time series of our S&P and FTSE datasets, respectively.
From the matrix in equation (19), we can build the
ordinary Pearson correlation matrix as in equation (U):

$$
c = \frac{1}{T} R R^{\mathrm{T}} = \frac{1}{T}
\begin{pmatrix}
R^{(1)} (R^{(1)})^{\mathrm{T}} & R^{(1)} (R^{(2)})^{\mathrm{T}} \\
R^{(2)} (R^{(1)})^{\mathrm{T}} & R^{(2)} (R^{(2)})^{\mathrm{T}}
\end{pmatrix} \tag{20}
$$

So, the asymmetric correlation matrix (for τ = 0)
k = R^(1) (R^(2))^T / T and its transpose are embedded as the
off-diagonal blocks of a larger object, which we shall call
joint correlation matrix, having real eigenvalues.

Fig. 10. Eigenvalue spectrum of the joint correlation matrix
in equation (20). For better visualization, the two
largest eigenvalues, equal to 112.7 and 31.8, have not been
plotted.

The eigenvalue spectrum of the joint correlation matrix
in equation (20) displays one main bulk (see Figure 10),
plus a few eigenvalues "leaking out" of such bulk.
Some of those can already be seen in Figure 10, but not
the largest two, equal to λ_1 = 112.7 and λ_2 = 31.8, i.e.
much larger than all the remaining ones. Very interestingly,
some intuition on the meaning of such eigenvalues
can be grasped by means of principal component analysis.
Let us denote the eigenvalues of the joint correlation
matrix c as λ_1 > λ_2 > . . . > λ_2N, and the corresponding
normalized eigenvectors as V^(i) = (V^(i)_1, . . . , V^(i)_2N), for
i = 1, . . . , 2N. Denoting principal components as e_it,
equation (11) can be specialized to the present case by writing

$$
R_{it} = \sum_{j=1}^{2N} \cdots \tag{21}
$$



In the above equation, values of the index i going from 1
to N cover stocks belonging to the S&P Index, while values
going from N + 1 to 2N refer to stocks in the FTSE
Index. Given the above considerations on the eigenvalue
spectrum of the c matrix, it is certainly interesting to
look at the components of V^(1) and V^(2), i.e.
the eigenvectors corresponding to the two largest eigenvalues.
In Figure 11 the components of V^(1) are reported,
distinguishing those related to S&P stocks (solid line) from
those related to FTSE stocks (dashed line). As one can
see, both component groups are positive and they partially
overlap. Thus, from equation (21) one can conclude that
the first principal component of c impacts all stocks in
approximately the same way. On the contrary, one can
see in Figure 12 that the components of V^(2)
are split into two well separated groups: components
related to S&P stocks are positive, while components related
to FTSE stocks are negative. Also, one can verify that the
component distributions for all the remaining eigenvectors
(from V^(3) to V^(2N)) almost exactly overlap. These facts
suggest the following interpretation. The largest eigenvalue
λ_1 is a "global market" eigenvalue, meaning that the
corresponding principal component, accounting for 28.2%
of the overall data variance, drives both markets in the
same direction (all V^(1)_i's positive), and roughly drives all
of their stocks with the same intensity (partial overlap of
the two distributions in Figure 11). On the other hand,
Figure 12 makes it clear that the principal component
related to the second largest eigenvalue, accounting for
almost 8% of the overall data variance, is the main source of
negative correlation between the two markets under study.
Those observations essentially match the results discussed
in Section 3.3. In particular, in Figure 5 it was shown that
the main principal components of the two markets are
strongly correlated for τ = 0, whereas their second most
relevant principal components are negatively correlated.
So, both analyses point out two main sources of
correlation, one positive and one negative, between the two
markets. The remaining eigenvalues of c do not allow for
similarly clear interpretations, and this is quite well
portrayed by the almost overlapping eigenvector component
distributions shown in Figure 13.

Fig. 11. Component distribution for the eigenvector V^(1)
related to the largest eigenvalue λ_1 of the joint correlation
matrix in equation (20). The distribution of components
related to S&P stocks is plotted with the solid line, while
the one of components related to FTSE stocks is plotted
with the dashed line.

Fig. 12. Component distribution for the eigenvector V^(2)
related to the second largest eigenvalue λ_2 of the joint
correlation matrix in equation (20). The distribution of
components related to S&P stocks is plotted with the solid
line, while the one of components related to FTSE stocks
is plotted with the dashed line.

Fig. 13. Component distribution of the eigenvectors V^(i)
of the joint correlation matrix in equation (20) for i =
3, . . . , 2N. Blue dots refer to components related to S&P
stocks, whereas red crosses refer to components related
to FTSE stocks.
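
The construction discussed in this section can be illustrated with a short sketch on synthetic data. Everything in it is an assumption made for illustration: the sizes, the two hypothetical factors (`global_factor`, pushing both sets in the same direction, and `relative_factor`, pushing them in opposite directions) and their loadings are not taken from the S&P and FTSE datasets. Since the trace of the joint correlation matrix equals 2N, the fraction of variance carried by an eigenvalue is λ_i / 2N; with the 200 stocks per dataset mentioned earlier this gives 112.7/400 ≈ 28.2% and 31.8/400 ≈ 8.0%, consistent with the figures quoted in the text.

```python
import numpy as np

def standardize(x):
    """Give every row (time series) zero mean and unit variance."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

rng = np.random.default_rng(2)
N, T = 30, 3000                           # illustrative sizes only
global_factor = rng.standard_normal(T)    # hypothetical: same sign in both sets
relative_factor = rng.standard_normal(T)  # hypothetical: opposite sign in the two sets

r1 = standardize(rng.standard_normal((N, T)) + global_factor + 0.5 * relative_factor)
r2 = standardize(rng.standard_normal((N, T)) + global_factor - 0.5 * relative_factor)

r = np.vstack([r1, r2])      # 2N x T matrix, as in equation (19)
c = r @ r.T / T              # joint correlation matrix, as in equation (20)
k = r1 @ r2.T / T            # asymmetric correlation matrix at zero lag

# c is symmetric, so eigh applies; sort eigenvalues in decreasing order.
eigenvalues, eigenvectors = np.linalg.eigh(c)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Fraction of the overall variance carried by each eigenvalue (trace of c is 2N).
explained = eigenvalues / eigenvalues.sum()

# Eigenvectors are defined up to an overall sign; fix it for readability.
v1 = eigenvectors[:, 0] * np.sign(eigenvectors[:, 0].sum())
v2 = eigenvectors[:, 1] * np.sign(eigenvectors[:N, 1].sum())

print("k is the upper-right block of c:", np.allclose(c[:N, N:], k))
print("two largest eigenvalues:", eigenvalues[:2])
print("their share of variance:", explained[:2])
print("v1 all positive:", bool(np.all(v1 > 0)))
print("v2 positive on set 1, negative on set 2:",
      bool(np.all(v2[:N] > 0) and np.all(v2[N:] < 0)))
```

In this toy setting the first eigenvector has components of a single sign, while the second one splits the two sets into components of opposite sign, mirroring the pattern described for Figures 11 and 12.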



5 Conclusions 

In very general terms, the main motivation for the study
presented in this paper was to look for an empirical
realization of a random asymmetric generalized correlation
matrix of the type k(τ) and its eigenvalue density
(equations (8) and (9)), attempting to perform a correlation
analysis with complex eigenvalues. Financial data were
chosen as a case study, but all of the analyses performed
could be exactly replicated in any context where time
series are involved.

As already stated, looking at eigenvalues might represent
a limitation, since it forces one to work with square
matrices. From the financial viewpoint, this limitation forced
us to work with an equal number of stocks in the S&P and
FTSE datasets. Drawing more significant conclusions on
the possible correlations between the two whole indices (or
markets) would require keeping the datasets at their actual
dimensions, and consequently working with singular values,
as in [53]. Still, whenever one is reasonably allowed to
work with an approximately equal number of variables in
the two sub-systems, the radial density (9) represents an
effective tool to detect the presence of cross-correlations
(as in Figure 4) or autocorrelations almost at first glance,
or at least after a quick fitting procedure to determine the
effective value of the q ratio.

As far as financial aspects are concerned, all the results
we presented suggest that all macroscopically relevant
correlations between the New York and London stock
exchanges expire within a 24-hour time window. Switching
to principal components and studying the spectral
properties of the joint correlation matrix (20) allowed us both
to corroborate such findings and to unravel some other
non trivial facts, such as the identification of the main
sources of positive and negative correlation between the
markets we considered, and the emergence of an effective
system dimensionality due to autocorrelations in the
principal components. Also, it would definitely be interesting
to repeat all or some of the analyses detailed in this paper
on high frequency data, possibly comparing the results
with those presented in the previously mentioned paper 

Lastly, from the viewpoint of RMT, equation (15)
represents a very interesting starting point for possible
future developments. More specifically: principal component
analysis guarantees that the variables which give rise to
the k^(e)(τ) matrix are exactly uncorrelated within each
sub-system. So, whenever those are reasonably well
described by Gaussian statistics, we know that the average
eigenvalue density of k^(e)(τ) is given by equations (8)
and (9) (possibly for some effective value of q, as we
discussed). Thus, equation (15) describes the transition from
the eigenvalue density arising from two uncorrelated
systems (encoded in k^(e)(τ)) to the one of two systems having
the correlation structure encoded in the W^(1) and W^(2)

matrices (see equation (fl~T]0 . This is an interesting prop- 
erty at least for the following reason. As far as theoretical 
advances in RMT are concerned, one could try to use
recently developed tools about the multiplicative structure
of random matrices [34] in order to derive analytical, or
semi-analytical, results for the spectrum of the k(τ)
matrix seen as the outcome of the multiplicative action of
two fixed known matrices (W^(1) and W^(2)) on a known
spectrum (the one given by k^(e)(τ)). Also, generalizing the
results in equations (8) and (9) to the eigenvalue spectra
of asymmetric correlation matrices arising from random
variables displaying both cross-correlations and
autocorrelations would represent a major challenge to RMT
developments. However, intuition based on similar
generalizations for ordinary correlation matrices (see for example
[20]) suggests that the presence of short-lived, e.g.
exponentially damped, autocorrelations would not modify the
eigenvalue spectra in a dramatic fashion.

We thank Guido Montagna for helpful suggestions and for 
reading the preliminary version of our manuscript. G. L. 
also wishes to thank Oreste Nicrosini and Andrea Schirru 
for many stimulating discussions during the early stages 
of this work. 